Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition
Authors
Abstract
In many applications, the data consist of (or may be naturally formulated as) an m × n matrix A which may be stored on disk but which is too large to be read into random access memory (RAM) or to practically perform superlinear polynomial time computations on it. Two algorithms are presented which, when given an m × n matrix A, compute approximations to A that are the product of three smaller matrices, C, U, and R, each of which may be computed rapidly. Let A′ = CUR be the computed approximate decomposition; both algorithms have provable bounds for the error matrix A − A′. In the first algorithm, c columns of A and r rows of A are randomly chosen. If the m × c matrix C consists of those c columns of A (after appropriate rescaling) and the r × n matrix R consists of those r rows of A (also after appropriate rescaling), then the c × r matrix U may be calculated from C and R. For any matrix X, let ‖X‖_F and ‖X‖_2 denote its Frobenius norm and its spectral norm, respectively. It is proven that

‖A − A′‖_ξ ≤ min_{D : rank(D) ≤ k} ‖A − D‖_ξ + poly(k, 1/c) ‖A‖_F

holds in expectation and with high probability, both for ξ = 2, F and for all k = 1, …, rank(A); thus, by appropriate choice of k,

‖A − A′‖_2 ≤ ε ‖A‖_F

also holds in expectation and with high probability. This algorithm may be implemented without storing the matrix A in RAM, provided it can make two passes over the matrix stored in external memory and use O(m + n) additional RAM (assuming that c and r are constants, independent of the size of the input). The second algorithm is similar, except that it approximates the matrix C by randomly sampling a constant number of rows of C. Thus, it has additional error, but it can be implemented in three passes over the matrix using only constant additional RAM.
To achieve an additional error (beyond the best rank-k approximation) that is at most ε‖A‖_F, both algorithms take time which is a low-degree polynomial in k, 1/ε, and 1/δ, where δ > 0 is a failure probability; the first takes time linear in max(m, n) and the second takes time independent of m and n. The proofs for the error bounds make important use of matrix perturbation theory and previous work on approximating matrix multiplication and computing low-rank approximations to a matrix. The probability distribution over columns and rows and the rescaling are crucial features of the algorithms and must be chosen judiciously.
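The sampling scheme described above can be illustrated with a minimal NumPy sketch. The code below samples c columns and r rows with probabilities proportional to their squared Euclidean norms and rescales them as the abstract describes; for U, it uses the simple choice of the pseudoinverse of the rescaled intersection block (a pseudoskeleton-style construction), which is an assumption for illustration and differs from the paper's exact construction of U from C and R.

```python
import numpy as np

def cur_approx(A, c, r, rng=None):
    """Sketch of a randomized CUR approximation.

    Samples c columns and r rows of A with probabilities proportional
    to their squared norms, rescales them, and builds U from the
    sampled cross-section. NOTE: this U (pseudoinverse of the rescaled
    intersection block) is a simplified illustrative choice, not the
    construction analyzed in the paper.
    """
    rng = np.random.default_rng(rng)
    fro2 = np.linalg.norm(A, "fro") ** 2
    # Sampling probabilities p_j = ||A^(j)||^2 / ||A||_F^2 (columns),
    # and analogously for rows.
    pc = (A ** 2).sum(axis=0) / fro2
    pr = (A ** 2).sum(axis=1) / fro2
    jc = rng.choice(A.shape[1], size=c, replace=True, p=pc)
    ir = rng.choice(A.shape[0], size=r, replace=True, p=pr)
    # Rescaling by 1/sqrt(c * p_j) makes C C^T an unbiased estimate of A A^T.
    C = A[:, jc] / np.sqrt(c * pc[jc])           # m x c
    R = A[ir, :] / np.sqrt(r * pr[ir])[:, None]  # r x n
    W = C[ir, :] / np.sqrt(r * pr[ir])[:, None]  # r x c rescaled intersection
    U = np.linalg.pinv(W)                        # c x r
    return C, U, R
```

For a rank-1 matrix with nonzero entries, this sketch recovers A exactly (up to floating-point error) for any sampled columns and rows; for general matrices it only approximates A, with error depending on c and r.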
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
Journal: SIAM J. Comput.
Volume 36, Issue —
Pages —
Published 2006